This project explores the relationship between happiness (rated on a scale from 0 to 10) and employment across sectors in countries around the world.
Our data came from the International Labour Organization, Kaggle, the UN, Google Developers, and the World Bank.
The International Labour Organization website provided the datasets for Employment by Sector and Status in Employment. Using the data.world query functions, we subset both datasets to the year 2012 and joined them together.
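The subsetting and joining were done with data.world queries, which are not reproduced here. As a rough sketch of the same idea in base R, the example below filters two toy tables to 2012 and joins them on Country and Year. The table names, column names, and values are invented for illustration and are not the actual ILO data.

```r
# Toy stand-ins for the two ILO tables; all values are invented.
sector <- data.frame(
  Country = c("A", "A", "B", "B"),
  Year = c(2011L, 2012L, 2011L, 2012L),
  agriculture_thousands = c(10, 12, 30, 28),
  stringsAsFactors = FALSE
)
status <- data.frame(
  Country = c("A", "A", "B", "B"),
  Year = c(2011L, 2012L, 2011L, 2012L),
  employers_thousands = c(3, 4, 5, 6),
  stringsAsFactors = FALSE
)

# Keep only the 2012 rows from each table
sector_2012 <- subset(sector, Year == 2012)
status_2012 <- subset(status, Year == 2012)

# Join on the shared keys so each country appears once
employment_2012 <- merge(sector_2012, status_2012, by = c("Country", "Year"))
print(employment_2012)
```

Joining on both Country and Year, rather than Country alone, guards against accidentally pairing rows from different years if the filter is ever changed.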
Kaggle provided the World Happiness data. This dataset gives each country's happiness rank, the region in which the country is located, and a happiness score. The scores are based on answers to a question asked in the Gallup World Poll. Quoting from Data.World, “This question, known as the Cantril ladder, asks respondents to think of a ladder with the best possible life for them being a 10 and the worst possible life being a 0 and to rate their own current lives on that scale.”
We needed one more categorical variable but could not find a suitable dataset. The UN report on “Country Classification” lists each country in one of four categories: Low Income, Lower Middle Income, Upper Middle Income, or High Income. No easily downloadable CSV was available, so we created one in Excel. We did not run it through an ETL file because it has only two columns, Country and Income Class, and there was nothing to clean up. We joined this dataset with the joined employment dataset.
The Google Developers site provided the longitude and latitude data for each country; we then joined this dataset with the joined dataset described above.
Finally, to make the data more readable and to create a calculated field giving the percentage of people in each sector (Agriculture, Industry, and Services), we needed population data for each country. This came from the World Bank website. This dataset was, again, joined with the dataset containing all the previous joins.
Our final join combined the dataset containing all of the above with the World Happiness dataset. This was an outer join on Country name.
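The calculated field itself is not shown in this report. One plausible version, assuming the percentage is sector employment divided by total population, is sketched below in base R. Because the employment columns are reported in thousands, they are scaled by 1000 before dividing. The column names and numbers are hypothetical.

```r
# Toy employment and population tables; all values are invented.
employment <- data.frame(
  Country = c("A", "B"),
  agriculture_thousands = c(100, 50),
  industry_thousands = c(300, 150),
  services_thousands = c(600, 300),
  stringsAsFactors = FALSE
)
population <- data.frame(
  Country = c("A", "B"),
  population = c(2000000, 1000000),
  stringsAsFactors = FALSE
)

joined <- merge(employment, population, by = "Country")

# Assumed formula: percent of the population employed in each sector.
# Employment is in thousands, so convert to persons first.
for (s in c("agriculture", "industry", "services")) {
  workers <- joined[[paste0(s, "_thousands")]] * 1000
  joined[[paste0(s, "_pct")]] <- 100 * workers / joined$population
}
print(joined[, c("Country", "agriculture_pct", "industry_pct", "services_pct")])
```

Under this definition the three percentages do not sum to 100, since not everyone in the population is employed. Dividing by total employment instead would give sector shares that do sum to 100.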
All joins were completed in data.world, with one exception: we also joined the World Happiness dataset with the combined dataset above in Tableau.
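An outer join on country name keeps every country from both sources, so names that are spelled differently in the two datasets end up as separate rows with missing values. The base R sketch below shows the join and a simple check for such unmatched rows. The data frames and columns are toy examples, not the project's actual tables.

```r
# Toy tables; country "C" is missing from happiness, "D" from employment.
employment <- data.frame(
  Country = c("A", "B", "C"),
  income_class = c("High Income", "Low Income", "Upper Middle Income"),
  stringsAsFactors = FALSE
)
happiness <- data.frame(
  Country = c("A", "B", "D"),
  happiness_score = c(7.1, 4.2, 5.5),
  stringsAsFactors = FALSE
)

# all = TRUE makes merge() a full outer join
combined <- merge(employment, happiness, by = "Country", all = TRUE)

# Rows that came from only one source carry NA in the other source's columns
unmatched <- combined$Country[
  is.na(combined$income_class) | is.na(combined$happiness_score)
]
print(unmatched)
```

Listing the unmatched names after the join is a quick way to spot spelling mismatches between sources before any analysis. With dplyr attached, `full_join()` performs the same kind of join.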
Below we display our sessionInfo().
sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] grid stats graphics grDevices utils datasets
[7] methods base
other attached packages:
[1] cowplot_0.7.0 gridExtra_2.2.1 lubridate_1.6.0
[4] leaflet_1.1.0 tidyr_0.6.1 reshape2_1.4.2
[7] readr_1.1.0 data.world_0.1.2 DT_0.2
[10] plotly_4.6.0 RCurl_1.95-4.8 bitops_1.0-6
[13] ggplot2_2.2.1 dplyr_0.5.0 shinydashboard_0.5.3
[16] shiny_1.0.0
loaded via a namespace (and not attached):
[1] Rcpp_0.12.10 plyr_1.8.4 tools_3.3.2
[4] digest_0.6.11 viridisLite_0.2.0 jsonlite_1.3
[7] tibble_1.2 gtable_0.2.0 DBI_0.5-1
[10] crosstalk_1.0.0 curl_2.4 yaml_2.1.14
[13] knitr_1.15.1 stringr_1.2.0 httr_1.2.1
[16] htmlwidgets_0.8 sourcetools_0.1.5 hms_0.3
[19] R6_2.2.0 purrr_0.2.2 magrittr_1.5
[22] scales_0.4.1 htmltools_0.3.5 rsconnect_0.7
[25] assertthat_0.1 mime_0.5 xtable_1.8-2
[28] colorspace_1.3-2 httpuv_1.3.3 labeling_0.3
[31] stringi_1.1.2 lazyeval_0.2.0 munsell_0.4.3
Cleaning Up World Happiness Dataset
This data was not difficult to clean up and remained largely untouched.
source("../01 Data/R_ETL_World_Happiness.R")
Cleaning Up Joined Employment Dataset
This dataset also did not require much cleaning; the joining and subsetting happened in Data.World. There were no commas, strange column headings, etc.
source("../01 Data/R_ETL_GET_status_sector_2012.R")
Parsed with column specification:
cols(
.default = col_integer(),
Country = col_character(),
`Total employment in agriculture (thousands)` = col_number(),
`Male employment in agriculture (thousands)` = col_number(),
`Female employment in agriculture (thousands)` = col_number(),
`Total employment in industry (thousands)` = col_number(),
`Male employment in industry (thousands)` = col_number(),
`Female employment in industry (thousands)` = col_number(),
`Total employment in services (thousands)` = col_number(),
`Male employment in services (thousands)` = col_number(),
`Female employment in services (thousands)` = col_number()
)
See spec(...) for full column specifications.
Classes tbl_df, tbl and 'data.frame': 188 obs. of 26 variables:
$ Year : int 2012 2012 2012 2012 2012 2012 2012 2012 2012 2012 ...
$ Country : chr "Sweden" "United Kingdom" "Kyrgyzstan" "Iraq" ...
$ Total employment in agriculture (thousands) : num 98 362 746 1330 35 70 0 97 66 21 ...
$ Male employment in agriculture (thousands) : num 75 262 423 836 33 56 0 78 65 21 ...
$ Female employment in agriculture (thousands): num 23 100 323 493 3 14 0 20 1 0 ...
$ Total employment in industry (thousands) : num 918 5667 516 1337 292 ...
$ Male employment in industry (thousands) : num 746 4619 365 1301 271 ...
$ Female employment in industry (thousands) : num 172 1048 150 37 21 ...
$ Total employment in services (thousands) : num 3653 23543 1091 4319 1346 ...
$ Male employment in services (thousands) : num 1631 11037 574 3741 1096 ...
$ Female employment in services (thousands) : num 2023 12506 517 578 250 ...
$ Wage and salaried workers : int 4179 25213 1269 4658 1407 2451 1550 983 1383 1515 ...
$ Employers : int 175 754 37 595 104 96 17 85 16 12 ...
$ Own-account workers : int 302 3485 632 1321 155 144 31 498 45 7 ...
$ Contributing family workers : int 12 119 414 413 7 7 7 48 22 0 ...
$ Vulnerable employment : int 314 3605 1046 1733 162 151 39 546 67 7 ...
$ Wage and salaried workers(2) : int 2095 12890 800 4138 1144 1244 1174 694 1199 1342 ...
$ Employers(2) : int 138 561 28 578 100 75 14 81 15 12 ...
$ Own-account workers(2) : int 214 2416 373 891 150 98 28 446 31 7 ...
$ Contributing family workers(2) : int 5 50 161 270 6 2 5 31 11 0 ...
$ Vulnerable employment(2) : int 219 2466 533 1161 156 100 33 477 43 7 ...
$ Wage and salaried workers(3) : int 2084 12323 469 520 263 1207 376 288 184 173 ...
$ Employers(3) : int 38 193 8 16 4 21 3 4 1 0 ...
$ Own-account workers(3) : int 89 1069 260 430 6 46 3 52 13 0 ...
$ Contributing family workers(3) : int 7 70 253 143 1 5 2 18 11 0 ...
$ Vulnerable employment(3) : int 96 1139 513 572 6 51 5 69 25 0 ...
- attr(*, "spec")=List of 2
..$ cols :List of 26
.. ..$ Year : list()
.. .. ..- attr(*, "class")= chr "collector_integer" "collector"
.. ..$ Country : list()
.. .. ..- attr(*, "class")= chr "collector_character" "collector"
.. ..$ Total employment in agriculture (thousands) : list()
.. .. ..- attr(*, "class")= chr "collector_number" "collector"
.. ..$ Male employment in agriculture (thousands) : list()
.. .. ..- attr(*, "class")= chr "collector_number" "collector"
.. ..$ Female employment in agriculture (thousands): list()
.. .. ..- attr(*, "class")= chr "collector_number" "collector"
.. ..$ Total employment in industry (thousands) : list()
.. .. ..- attr(*, "class")= chr "collector_number" "collector"
.. ..$ Male employment in industry (thousands) : list()
.. .. ..- attr(*, "class")= chr "collector_number" "collector"
.. ..$ Female employment in industry (thousands) : list()
.. .. ..- attr(*, "class")= chr "collector_number" "collector"
.. ..$ Total employment in services (thousands) : list()
.. .. ..- attr(*, "class")= chr "collector_number" "collector"
.. ..$ Male employment in services (thousands) : list()
.. .. ..- attr(*, "class")= chr "collector_number" "collector"
.. ..$ Female employment in services (thousands) : list()
.. .. ..- attr(*, "class")= chr "collector_number" "collector"
.. ..$ Wage and salaried workers : list()
.. .. ..- attr(*, "class")= chr "collector_integer" "collector"
.. ..$ Employers : list()
.. .. ..- attr(*, "class")= chr "collector_integer" "collector"
.. ..$ Own-account workers : list()
.. .. ..- attr(*, "class")= chr "collector_integer" "collector"
.. ..$ Contributing family workers : list()
.. .. ..- attr(*, "class")= chr "collector_integer" "collector"
.. ..$ Vulnerable employment : list()
.. .. ..- attr(*, "class")= chr "collector_integer" "collector"
.. ..$ Wage and salaried workers(2) : list()
.. .. ..- attr(*, "class")= chr "collector_integer" "collector"
.. ..$ Employers(2) : list()
.. .. ..- attr(*, "class")= chr "collector_integer" "collector"
.. ..$ Own-account workers(2) : list()
.. .. ..- attr(*, "class")= chr "collector_integer" "collector"
.. ..$ Contributing family workers(2) : list()
.. .. ..- attr(*, "class")= chr "collector_integer" "collector"
.. ..$ Vulnerable employment(2) : list()
.. .. ..- attr(*, "class")= chr "collector_integer" "collector"
.. ..$ Wage and salaried workers(3) : list()
.. .. ..- attr(*, "class")= chr "collector_integer" "collector"
.. ..$ Employers(3) : list()
.. .. ..- attr(*, "class")= chr "collector_integer" "collector"
.. ..$ Own-account workers(3) : list()
.. .. ..- attr(*, "class")= chr "collector_integer" "collector"
.. ..$ Contributing family workers(3) : list()
.. .. ..- attr(*, "class")= chr "collector_integer" "collector"
.. ..$ Vulnerable employment(3) : list()
.. .. ..- attr(*, "class")= chr "collector_integer" "collector"
..$ default: list()
.. ..- attr(*, "class")= chr "collector_guess" "collector"
..- attr(*, "class")= chr "col_spec"
Classes tbl_df, tbl and 'data.frame': 188 obs. of 26 variables:
$ Year : Factor w/ 1 level "2012": 1 1 1 1 1 1 1 1 1 1 ...
$ Country : Factor w/ 188 levels "Afghanistan",..: 163 178 90 80 86 46 89 94 129 139 ...
$ Total employment in agriculture (thousands) : Factor w/ 175 levels "0","10","102",..: 172 96 150 19 94 148 1 170 143 60 ...
$ Male employment in agriculture (thousands) : Factor w/ 175 levels "0","1","100",..: 155 79 110 163 100 130 1 160 143 65 ...
$ Female employment in agriculture (thousands): Factor w/ 149 levels "0","1","10","100",..: 60 4 83 102 76 28 1 51 2 1 ...
$ Total employment in industry (thousands) : Factor w/ 175 levels "100","1016","1019",..: 168 128 120 22 81 124 81 94 143 159 ...
$ Male employment in industry (thousands) : Factor w/ 178 levels "1015","104","108",..: 151 117 98 16 77 109 75 87 139 156 ...
$ Female employment in industry (thousands) : Factor w/ 151 levels "10","1048","105",..: 38 2 29 89 52 12 66 69 4 124 ...
$ Total employment in services (thousands) : Factor w/ 186 levels "102","108","1091",..: 115 80 3 129 27 69 22 12 167 159 ...
$ Male employment in services (thousands) : Factor w/ 181 levels "101","1015","104",..: 35 7 126 93 6 178 179 167 131 121 ...
$ Female employment in services (thousands) : Factor w/ 177 levels "102","10298",..: 58 19 122 128 74 13 100 90 55 44 ...
$ Wage and salaried workers : Factor w/ 182 levels "1021","1024",..: 122 83 21 131 29 80 45 182 26 39 ...
$ Employers : Factor w/ 143 levels "1","10","1005",..: 39 127 90 114 7 143 36 136 31 16 ...
$ Own-account workers : Factor w/ 179 levels "10","102","1034",..: 96 111 151 22 33 30 99 142 137 154 ...
$ Contributing family workers : Factor w/ 153 levels "0","1","10","104426",..: 18 17 98 97 130 130 130 105 59 1 ...
$ Vulnerable employment : Factor w/ 182 levels "10","1001","10064",..: 90 101 6 38 32 27 114 147 160 163 ...
$ Wage and salaried workers(2) : Factor w/ 185 levels "101084","1012",..: 59 25 164 109 13 24 14 148 16 30 ...
$ Employers(2) : Factor w/ 131 levels "0","1","100",..: 20 97 65 99 3 115 21 122 26 11 ...
$ Own-account workers(2) : Factor w/ 180 levels "10","10489","1050",..: 74 84 115 171 33 179 94 121 106 153 ...
$ Contributing family workers(2) : Factor w/ 133 levels "0","1","10","11",..: 100 101 27 54 111 39 100 62 4 1 ...
$ Vulnerable employment(2) : Factor w/ 180 levels "10","100","101",..: 77 85 143 8 36 2 105 132 127 155 ...
$ Wage and salaried workers(3) : Factor w/ 173 levels "10","1037","104",..: 57 15 111 115 68 13 92 73 47 39 ...
$ Employers(3) : Factor w/ 94 levels "0","1","10","100",..: 55 23 88 16 57 27 46 57 2 1 ...
$ Own-account workers(3) : Factor w/ 169 levels "0","10","100",..: 161 9 69 100 125 104 74 116 25 1 ...
$ Contributing family workers(3) : Factor w/ 132 levels "0","1","100",..: 112 113 50 24 2 89 39 35 5 1 ...
$ Vulnerable employment(3) : Factor w/ 172 levels "0","1","100",..: 170 15 125 134 137 123 122 148 77 1 ...
- attr(*, "spec")=List of 2
..$ cols :List of 26
.. ..$ Year : list()
.. .. ..- attr(*, "class")= chr "collector_integer" "collector"
.. ..$ Country : list()
.. .. ..- attr(*, "class")= chr "collector_character" "collector"
.. ..$ Total employment in agriculture (thousands) : list()
.. .. ..- attr(*, "class")= chr "collector_number" "collector"
.. ..$ Male employment in agriculture (thousands) : list()
.. .. ..- attr(*, "class")= chr "collector_number" "collector"
.. ..$ Female employment in agriculture (thousands): list()
.. .. ..- attr(*, "class")= chr "collector_number" "collector"
.. ..$ Total employment in industry (thousands) : list()
.. .. ..- attr(*, "class")= chr "collector_number" "collector"
.. ..$ Male employment in industry (thousands) : list()
.. .. ..- attr(*, "class")= chr "collector_number" "collector"
.. ..$ Female employment in industry (thousands) : list()
.. .. ..- attr(*, "class")= chr "collector_number" "collector"
.. ..$ Total employment in services (thousands) : list()
.. .. ..- attr(*, "class")= chr "collector_number" "collector"
.. ..$ Male employment in services (thousands) : list()
.. .. ..- attr(*, "class")= chr "collector_number" "collector"
.. ..$ Female employment in services (thousands) : list()
.. .. ..- attr(*, "class")= chr "collector_number" "collector"
.. ..$ Wage and salaried workers : list()
.. .. ..- attr(*, "class")= chr "collector_integer" "collector"
.. ..$ Employers : list()
.. .. ..- attr(*, "class")= chr "collector_integer" "collector"
.. ..$ Own-account workers : list()
.. .. ..- attr(*, "class")= chr "collector_integer" "collector"
.. ..$ Contributing family workers : list()
.. .. ..- attr(*, "class")= chr "collector_integer" "collector"
.. ..$ Vulnerable employment : list()
.. .. ..- attr(*, "class")= chr "collector_integer" "collector"
.. ..$ Wage and salaried workers(2) : list()
.. .. ..- attr(*, "class")= chr "collector_integer" "collector"
.. ..$ Employers(2) : list()
.. .. ..- attr(*, "class")= chr "collector_integer" "collector"
.. ..$ Own-account workers(2) : list()
.. .. ..- attr(*, "class")= chr "collector_integer" "collector"
.. ..$ Contributing family workers(2) : list()
.. .. ..- attr(*, "class")= chr "collector_integer" "collector"
.. ..$ Vulnerable employment(2) : list()
.. .. ..- attr(*, "class")= chr "collector_integer" "collector"
.. ..$ Wage and salaried workers(3) : list()
.. .. ..- attr(*, "class")= chr "collector_integer" "collector"
.. ..$ Employers(3) : list()
.. .. ..- attr(*, "class")= chr "collector_integer" "collector"
.. ..$ Own-account workers(3) : list()
.. .. ..- attr(*, "class")= chr "collector_integer" "collector"
.. ..$ Contributing family workers(3) : list()
.. .. ..- attr(*, "class")= chr "collector_integer" "collector"
.. ..$ Vulnerable employment(3) : list()
.. .. ..- attr(*, "class")= chr "collector_integer" "collector"
..$ default: list()
.. ..- attr(*, "class")= chr "collector_guess" "collector"
..- attr(*, "class")= chr "col_spec"
These visualizations show that as GDP increases, so does happiness. They also indicate the happiness trend within each region; for example, Australia and New Zealand have a nearly horizontal line. As only two countries are involved, this shows that the disparity between the two is small; GDP and happiness are quite similar for Australia and New Zealand. We also see clustering of countries in the Sub-Saharan Africa region with low happiness scores and low GDPs. Western Europe is on the opposite end of this spectrum with high happiness scores and high GDPs, though there is a fairly large disparity between the happiest country in Europe and the least happy based on the trend line.
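To make the idea of a per-region trend line concrete, the sketch below fits a separate straight line of happiness against GDP for each region and reports the slope. It uses base R's lm() on invented values; the column names are assumptions, and this is not necessarily how the trend lines in the visualizations were computed.

```r
# Toy data; GDP and happiness values are invented for illustration.
happiness <- data.frame(
  Region = c("R1", "R1", "R1", "R2", "R2"),
  gdp = c(1, 2, 3, 1, 2),
  happiness_score = c(4, 5, 6, 7.3, 7.3),
  stringsAsFactors = FALSE
)

# Fit one straight line per region and keep only its slope
region_slope <- function(d) {
  fit <- lm(happiness_score ~ gdp, data = d)
  unname(coef(fit)["gdp"])
}
slopes <- sapply(split(happiness, happiness$Region), region_slope)
print(slopes)
```

A slope near zero, as for the two-country region R2 here, corresponds to the nearly horizontal line described for Australia and New Zealand. With only two points the line fits exactly, so it describes the gap between the two countries rather than a broader trend.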
These visualizations show that Sub-Saharan Africa has both the largest percentage of people working in the Agriculture sector and the highest variance among those working in this sector. High income regions such as North America, Western Europe, and Australia and New Zealand have the least variance within each group. In these regions, the Services sector employs the largest percentage of people, and very few work in the Agriculture sector. The Middle East and North Africa region has large variance within each group.
The bar charts show that Sub-Saharan Africa is the least happy region within each Income Class. Overall, Australia and New Zealand are the happiest. The line shows the average happiness score of the region. Latin America and the Caribbean is the happiest region in both the Upper Middle Income and Lower Middle Income classes.
In this visualization, the scale is reversed: because we are averaging the happiness rank, a higher number means that a country or region is less happy on the whole. Each number is the average happiness rank of the countries in that region whose main sector is industry, agriculture, or services. The least happy group, on average, is the countries whose largest sector is agriculture. The only region that has a country whose main sector is Industry is the Middle East/North Africa.
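The grouping behind this view can be sketched in base R: label each country with its largest sector, then average the happiness rank within each region and sector pair. The data and column names below are hypothetical, and the sector percentages are assumed to be already computed.

```r
# Toy data; ranks and sector percentages are invented for illustration.
countries <- data.frame(
  Country = c("A", "B", "C", "D"),
  Region = c("R1", "R1", "R2", "R2"),
  happiness_rank = c(10, 20, 100, 120),
  agriculture_pct = c(5, 10, 40, 45),
  industry_pct = c(20, 15, 10, 15),
  services_pct = c(30, 35, 20, 10),
  stringsAsFactors = FALSE
)

# Main sector = the column holding the largest percentage in each row
sector_cols <- c("agriculture_pct", "industry_pct", "services_pct")
largest <- max.col(as.matrix(countries[sector_cols]), ties.method = "first")
countries$main_sector <- sub("_pct$", "", sector_cols[largest])

# Average rank per region and main sector; a larger number means less happy
avg_rank <- aggregate(happiness_rank ~ Region + main_sector,
                      data = countries, FUN = mean)
print(avg_rank)
```

Because rank 1 is the happiest country, the agriculture group's larger average rank in this toy example means it is the less happy group.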
The maps show that Africa and parts of the Middle East, Eastern Europe, and Southeast Asia are the least happy. Northern Europe, most of South America, and North America are largely happy. In the Shiny app you can hover over each marker to find the country and then select the marker to view the country’s happiness rank and score.